
    Enhancing Perception and Immersion in Pre-Captured Environments through Learning-Based Eye Height Adaptation

    Pre-captured immersive environments using omnidirectional cameras provide a wide range of virtual reality applications. Previous research has shown that manipulating the eye height in egocentric virtual environments can significantly affect distance perception and immersion. However, the influence of eye height in pre-captured real environments has received less attention due to the difficulty of altering the perspective after the capture process is finished. To explore this influence, we first propose a pilot study that captures real environments at multiple eye heights and asks participants to judge egocentric distances and immersion. If a significant influence is confirmed, an effective image-based approach to adapt pre-captured real-world environments to the user's eye height would be desirable. Motivated by the study, we propose a learning-based approach for synthesizing novel views of omnidirectional images with altered eye heights. This approach employs a multitask architecture that learns depth and semantic segmentation in two formats, and generates high-quality depth and semantic segmentation to facilitate the inpainting stage. With the improved omnidirectional-aware layered depth image, our approach synthesizes natural and realistic visuals for eye height adaptation. Quantitative and qualitative evaluation shows favorable results against state-of-the-art methods, and an extensive user study verifies improved perception and immersion for pre-captured real-world environments.
    Comment: 10 pages, 13 figures, 3 tables, submitted to ISMAR 202
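
    A minimal numpy sketch of just the geometric core is shown below: re-rendering an equirectangular RGB-D panorama from a camera raised by dy metres via forward point splatting. The function and its details are illustrative; the paper's learned multitask depth/segmentation, layered depth image, and inpainting stages are not reproduced here. Pixels that become disoccluded are left empty, which is what the inpainting stage then fills.

        import numpy as np

        def shift_eye_height(rgb, depth, dy):
            """Re-render an equirectangular RGB-D panorama (H, W) as seen from a
            camera raised by dy metres; unfilled pixels stay black (to be inpainted)."""
            H, W, _ = rgb.shape
            u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
            theta = u / W * 2 * np.pi - np.pi          # longitude in [-pi, pi)
            phi = np.pi / 2 - v / H * np.pi            # latitude in (-pi/2, pi/2)
            # Back-project each pixel to 3D (y up); moving the points down by dy
            # is equivalent to raising the camera by dy.
            x = depth * np.cos(phi) * np.sin(theta)
            y = depth * np.sin(phi) - dy
            z = depth * np.cos(phi) * np.cos(theta)
            r = np.sqrt(x * x + y * y + z * z)
            theta_n = np.arctan2(x, z)
            phi_n = np.arcsin(np.clip(y / np.maximum(r, 1e-6), -1.0, 1.0))
            un = (((theta_n + np.pi) / (2 * np.pi)) * W).astype(int) % W
            vn = np.clip(((np.pi / 2 - phi_n) / np.pi * H).astype(int), 0, H - 1)
            # Scatter far-to-near so nearer surfaces overwrite farther ones.
            out = np.zeros((H * W, 3), dtype=rgb.dtype)
            idx = (vn * W + un).ravel()
            order = np.argsort(-r, axis=None)
            out[idx[order]] = rgb.reshape(-1, 3)[order]
            return out.reshape(H, W, 3)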

    3D Reconstruction of Sculptures from Single Images via Unsupervised Domain Adaptation on Implicit Models

    Acquiring the virtual equivalent of exhibits, such as sculptures, in virtual reality (VR) museums can be labour-intensive and sometimes infeasible. Deep-learning-based 3D reconstruction approaches allow us to recover 3D shapes from 2D observations, among which single-view approaches reduce the need for human intervention and specialised equipment when acquiring 3D sculptures for VR museums. However, two challenges arise when attempting to use well-researched human reconstruction methods: limited data availability and domain shift. Considering that sculptures are usually related to humans, we propose an unsupervised 3D domain adaptation method for adapting a single-view 3D implicit reconstruction model from the source (real-world humans) to the target (sculptures) domain. We have compared the generated shapes with those from other methods and conducted ablation studies as well as a user study to demonstrate the effectiveness of our adaptation method. We also deploy our results in a VR application.
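
    The abstract does not spell out the adaptation mechanism, so the sketch below shows one generic way to do unsupervised feature-level domain adaptation for an implicit reconstruction model: adversarial alignment between source (human) and target (sculpture) image features, with a supervised occupancy loss on the labelled source only. Every network, shape, and name here is a hypothetical stand-in, not the authors' architecture.

        import torch
        import torch.nn as nn

        # Hypothetical stand-ins: `encoder` maps an image to a feature vector,
        # `implicit_head` predicts occupancy at 3D query points, and
        # `discriminator` tells source features from target features.
        encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU())
        implicit_head = nn.Sequential(nn.Linear(256 + 3, 128), nn.ReLU(), nn.Linear(128, 1))
        discriminator = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))

        bce = nn.BCEWithLogitsLoss()
        opt_g = torch.optim.Adam(list(encoder.parameters()) +
                                 list(implicit_head.parameters()), lr=1e-4)
        opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

        def adaptation_step(src_img, src_xyz, src_occ, tgt_img):
            """Supervised occupancy loss on the labelled source batch plus
            adversarial feature alignment against the unlabelled target batch."""
            # Discriminator: source features get label 1, target features label 0.
            with torch.no_grad():
                f_src, f_tgt = encoder(src_img), encoder(tgt_img)
            d_loss = (bce(discriminator(f_src), torch.ones(f_src.size(0), 1)) +
                      bce(discriminator(f_tgt), torch.zeros(f_tgt.size(0), 1)))
            opt_d.zero_grad()
            d_loss.backward()
            opt_d.step()

            # Generator: fit source occupancy and fool the discriminator on target.
            f_src, f_tgt = encoder(src_img), encoder(tgt_img)
            feat = f_src.unsqueeze(1).expand(-1, src_xyz.size(1), -1)      # (B, N, 256)
            occ_logit = implicit_head(torch.cat([feat, src_xyz], dim=-1))  # (B, N, 1)
            rec_loss = bce(occ_logit.squeeze(-1), src_occ)
            adv_loss = bce(discriminator(f_tgt), torch.ones(f_tgt.size(0), 1))
            g_loss = rec_loss + 0.1 * adv_loss
            opt_g.zero_grad()
            g_loss.backward()
            opt_g.step()
            return d_loss.item(), g_loss.item()

        # Toy usage: 64x64 RGB crops, 512 occupancy query points per source sample.
        src_img, tgt_img = torch.rand(4, 3, 64, 64), torch.rand(4, 3, 64, 64)
        src_xyz, src_occ = torch.rand(4, 512, 3), torch.randint(0, 2, (4, 512)).float()
        adaptation_step(src_img, src_xyz, src_occ, tgt_img)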

    On the Design Fundamentals of Diffusion Models: A Survey

    Diffusion models are generative models that gradually add and then remove noise to learn the underlying distribution of the training data. The components of diffusion models have received significant attention, with many design choices proposed. Existing reviews have primarily focused on higher-level solutions and therefore cover the design fundamentals of the components in less depth. This study seeks to address this gap by providing a comprehensive and coherent review of component-wise design choices in diffusion models. Specifically, we organize the review around three key components, namely the forward process, the reverse process, and the sampling procedure. This allows us to provide a fine-grained perspective on diffusion models, benefiting future studies in the analysis of individual components, the applicability of design choices, and the implementation of diffusion models.
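
    As a concrete anchor for the three components, the sketch below shows one common instantiation of the forward process: the DDPM closed-form noising step with a linear beta schedule. The reverse process would train a network to predict the returned eps, and the sampling procedure would iterate that prediction backwards from pure noise. The function name, toy data, and schedule values are illustrative and do not represent the full design space the survey covers.

        import numpy as np

        rng = np.random.default_rng(0)

        def forward_diffuse(x0, t, betas):
            """Closed-form DDPM forward process:
            x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I)."""
            alpha_bar = np.cumprod(1.0 - betas)     # cumulative product of (1 - beta)
            eps = rng.standard_normal(x0.shape)
            xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
            return xt, eps                          # eps is the reverse model's regression target

        # Linear beta schedule over T = 1000 steps; x0 stands in for a training image.
        betas = np.linspace(1e-4, 0.02, 1000)
        x0 = rng.standard_normal((32, 32))
        xt, eps = forward_diffuse(x0, t=500, betas=betas)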

    Automatic Dance Generation System Considering Sign Language Information

    In recent years, thanks to the development of 3DCG animation editing tools (e.g. MikuMikuDance), many 3D character dance animations have been created by amateur users. However, it is very difficult to create choreography from scratch without technical knowledge. Shiratori et al. [2006] produced an automatic dance generation system that considers the rhythm and intensity of dance motions. However, each segment is selected randomly from a database, so the generated dance motion carries no linguistic or emotional meaning. Takano et al. [2010] produced a human motion generation system that considers motion labels. However, they use simple motion labels such as "running" or "jump", so they cannot generate motions that express emotions. In reality, professional dancers choreograph based on musical features or lyrics, and express emotion or how they feel about the music. In our work, we aim to generate more emotional dance motions easily; therefore, we use the linguistic information in lyrics to generate dance motion. In this paper, we propose a system that generates sign dance motion from continuous sign language motion based on the lyrics of a song. This system could help deaf people experience music as a visualized music application.
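
    As a rough illustration of the lyric-driven idea, the hypothetical sketch below schedules sign-language motion segments whose gloss matches a lyric word onto the song's beat grid. The data structures, library, and matching rule are invented for illustration and are not the paper's actual pipeline.

        from dataclasses import dataclass

        @dataclass
        class SignSegment:
            gloss: str        # the word this sign-language motion expresses
            frames: list      # motion data, e.g. joint rotations per frame

        def lyrics_to_sign_dance(lyric_words, beat_times, library):
            """Pair each lyric word that has a matching sign segment with the next free beat."""
            timeline, beat_idx = [], 0
            for word in lyric_words:
                seg = library.get(word.lower())
                if seg is None or beat_idx >= len(beat_times):
                    continue              # no sign motion for this word: skip it
                timeline.append((beat_times[beat_idx], seg))
                beat_idx += 1
            return timeline

        # Toy usage with an invented two-entry sign-motion library.
        library = {"love": SignSegment("love", frames=[]), "you": SignSegment("you", frames=[])}
        plan = lyrics_to_sign_dance(["I", "love", "you"], beat_times=[0.0, 0.5, 1.0], library=library)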

    Arbitrary view action recognition via transfer dictionary learning on synthetic training data

    Human action recognition is an important problem in robotic vision. Traditional recognition algorithms usually require knowledge of the view angle, which is not always available in robotic applications such as active vision. In this paper, we propose a new framework to recognize actions from arbitrary views. A main feature of our algorithm is that view-invariance is learned from synthetic 2D and 3D training data using transfer dictionary learning. This guarantees the availability of training data and removes the hassle of obtaining real-world video at specific viewing angles. The result of the process is a dictionary that can project real-world 2D video into a view-invariant sparse representation, which facilitates the training of a view-invariant classifier. Experimental results on the IXMAS and N-UCLA datasets show significant improvements over existing algorithms.
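
    The sketch below illustrates the general dictionary-learning-then-classify pattern with scikit-learn: learn a dictionary, encode clips as sparse codes, and train a classifier on the codes. The random stand-in features, dimensions, and single shared dictionary are simplifications for illustration and do not reproduce the paper's coupled 2D/3D transfer dictionary formulation.

        import numpy as np
        from sklearn.decomposition import DictionaryLearning
        from sklearn.svm import LinearSVC

        rng = np.random.default_rng(0)

        # Stand-in data: rows are per-clip feature vectors. In the paper these come
        # from synthetic multi-view renderings; random data just keeps the sketch runnable.
        X_synth = rng.standard_normal((200, 64))   # synthetic multi-view training features
        y_synth = rng.integers(0, 5, 200)          # action labels
        X_real = rng.standard_normal((20, 64))     # real-world 2D video features

        # 1) Learn a dictionary on the synthetic data and encode everything sparsely.
        dico = DictionaryLearning(n_components=32, transform_algorithm="omp",
                                  transform_n_nonzero_coefs=5, random_state=0)
        codes_synth = dico.fit_transform(X_synth)  # sparse, (approximately) view-invariant codes
        codes_real = dico.transform(X_real)

        # 2) Train a classifier on the sparse codes and apply it to real video.
        clf = LinearSVC().fit(codes_synth, y_synth)
        pred = clf.predict(codes_real)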

    U3DS³: Unsupervised 3D Semantic Scene Segmentation

    Contemporary point cloud segmentation approaches largely rely on richly annotated 3D training data. However, it is both time-consuming and challenging to obtain consistently accurate annotations for such 3D scene data. Moreover, there is still a lack of investigation into fully unsupervised scene segmentation for point clouds, especially for holistic 3D scenes. This paper presents U3DS³ as a step towards completely unsupervised point cloud segmentation for any holistic 3D scene. To achieve this, U3DS³ provides a generalized unsupervised segmentation method for both objects and background across indoor and outdoor static 3D point clouds, with no requirement for model pre-training, by leveraging only the inherent information of the point cloud to achieve full 3D scene segmentation. The initial step of our approach generates superpoints based on the geometric characteristics of each scene. It then undergoes a learning process through a spatial-clustering-based methodology, followed by iterative training using pseudo-labels generated in accordance with the cluster centroids. Moreover, by exploiting the invariance and equivariance of the volumetric representations, we apply geometric transformations to voxelized features to provide two sets of descriptors for robust representation learning. Finally, our evaluation achieves state-of-the-art results on the ScanNet and SemanticKITTI benchmarks and competitive results on S3DIS.
    Comment: 10 pages, 4 figures, accepted to IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 202
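
    The pseudo-labelling step described above can be pictured as clustering per-superpoint features and broadcasting the cluster ids back to the points. The minimal sketch below shows only that step with random stand-in features; the backbone, voxelization, iterative retraining, and the invariance/equivariance terms are omitted, and all names are illustrative.

        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(0)

        # Toy stand-ins: per-point features from some backbone and a superpoint id per point.
        point_feats = rng.standard_normal((10000, 16))
        superpoint_id = rng.integers(0, 300, 10000)     # 300 geometric superpoints

        # 1) Average features over each superpoint (superpoints are the clustering unit).
        n_sp = superpoint_id.max() + 1
        sp_feats = np.zeros((n_sp, point_feats.shape[1]))
        np.add.at(sp_feats, superpoint_id, point_feats)
        counts = np.bincount(superpoint_id, minlength=n_sp)[:, None]
        sp_feats /= np.maximum(counts, 1)

        # 2) Cluster superpoints and broadcast cluster ids back to points as pseudo-labels,
        #    which a segmentation network would then be trained against, iteratively.
        kmeans = KMeans(n_clusters=20, n_init=10, random_state=0).fit(sp_feats)
        pseudo_labels = kmeans.labels_[superpoint_id]   # one pseudo-label per point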